In this project, I'm building pipelines to classify traffic signs from the German Traffic Sign Benchmarks.
I'm using scikit-learn's pipeline framework to train the model with various combinations of transformations and estimators.
The starting model is a convolutional network based on the LeNet architecture by Yann LeCun. LeNet was originally designed for handwritten and machine-printed character recognition.
The project is explained in the following sections.
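As a generic illustration of the scikit-learn pipeline framework (a sketch only; this project wires up its own preprocessing and TensorFlow estimator through `build_pipeline` in pipeline.py), transformations and a final estimator are chained so that `fit`/`score` run the whole sequence. The digits dataset stands in for the traffic sign images here.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Chain a transformation step and an estimator; fit()/score() run the sequence.
X, y = load_digits(return_X_y=True)  # stand-in data for illustration
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([('scale', StandardScaler()),
                     ('clf', LogisticRegression(max_iter=1000))])
pipeline.fit(X_train, y_train)
print('accuracy: {:.3f}'.format(pipeline.score(X_test, y_test)))
```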
The traffic sign images were taken from the German Traffic Sign Benchmarks.
> mkdir data
> # use wget or curl
> wget http://benchmark.ini.rub.de/Dataset/GTSRB_Final_Training_Images.zip
> unzip GTSRB_Final_Training_Images.zip
> mv GTSRB/Final_Training data/
> wget http://benchmark.ini.rub.de/Dataset/GTSRB_Final_Test_Images.zip
> unzip GTSRB_Final_Test_Images.zip
> mv GTSRB/Final_Test/ data/
> wget http://benchmark.ini.rub.de/Dataset/GTSRB_Final_Test_GT.zip
> unzip GTSRB_Final_Test_GT.zip
> mv GT-final_test.csv data/Final_Test/Images/
The training images are organized in folders by category. Each folder corresponds to one category (e.g., stop sign) and contains a label file (.csv), which is actually semicolon-delimited (not comma-delimited).
data
+ Final_Training
+ Images
+ 00000
+ 00000_00000.ppm
+ 00000_00001.ppm
...
+ GT-00000.csv
+ 00001
+ 00000_00000.ppm
+ 00000_00001.ppm
...
+ GT-00001.csv
...
All images are stored in the PPM format (Portable Pixmap, P6). You'll need to install both matplotlib and pillow to handle this image format. If you use one of the environment .yml files in this repository, this is taken care of.
import glob
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import cv2
import os
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import tensorflow as tf
from tensorflow.contrib.layers import flatten
from pipeline import NeuralNetwork, make_adam, Session, build_pipeline
matplotlib.style.use('ggplot')
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
All train image paths are combined into one dataframe for convenience.
TRAIN_IMAGE_DIR = 'data/Final_Training/Images'

dfs = []
for train_file in glob.glob(os.path.join(TRAIN_IMAGE_DIR, '*/GT-*.csv')):
    folder = train_file.split('/')[3]
    df = pd.read_csv(train_file, sep=';')
    df['Filename'] = df['Filename'].apply(lambda x: os.path.join(TRAIN_IMAGE_DIR, folder, x))
    dfs.append(df)

train_df = pd.concat(dfs, ignore_index=True)
train_df.head()
Annotation format
The following points are worth mentioning:
Later on, I'll examine sample images to further clarify those points.
There are 43 traffic sign classes in 39,209 training images.
N_CLASSES = np.unique(train_df['ClassId']).size # keep this for later
print("Number of training images : {:>5}".format(train_df.shape[0]))
print("Number of classes : {:>5}".format(N_CLASSES))
The distribution of classes is very skewed.
def show_class_distribution(classIDs, title):
    """
    Plot the traffic sign class distribution
    """
    plt.figure(figsize=(15, 5))
    plt.title('Class ID distribution for {}'.format(title))
    plt.hist(classIDs, bins=N_CLASSES)
    plt.show()
show_class_distribution(train_df['ClassId'], 'Train Data')
The name of each sign is stored in the sign_names.csv file. We can use it to see the distribution per sign name.
sign_name_df = pd.read_csv('sign_names.csv', index_col='ClassId')
sign_name_df.head()
sign_name_df['Occurrence'] = [sum(train_df['ClassId']==c) for c in range(N_CLASSES)]
sign_name_df.sort_values('Occurrence', ascending=False)
The following constant is defined for later use.
SIGN_NAMES = sign_name_df.SignName.values
SIGN_NAMES[2]
Let's examine some random images:
def load_image(image_file):
    """
    Read image file into numpy array (RGB)
    """
    return plt.imread(image_file)

def get_samples(image_data, num_samples, class_id=None):
    """
    Randomly select image filenames and their class IDs
    """
    if class_id is not None:
        image_data = image_data[image_data['ClassId']==class_id]
    indices = np.random.choice(image_data.shape[0], size=num_samples, replace=False)
    return image_data.iloc[indices][['Filename', 'ClassId']].values
def show_images(image_data, cols=5, sign_names=None, show_shape=False, func=None):
    """
    Given a list of image file paths, load images and show them.
    """
    num_images = len(image_data)
    rows = (num_images + cols - 1)//cols  # round up so every image gets a subplot
    plt.figure(figsize=(cols*3, rows*2.5))
    for i, (image_file, label) in enumerate(image_data):
        image = load_image(image_file)
        if func is not None:
            image = func(image)
        plt.subplot(rows, cols, i+1)
        plt.imshow(image)
        if sign_names is not None:
            plt.text(0, 0, '{}: {}'.format(label, sign_names[label]), color='k', backgroundcolor='c', fontsize=8)
        if show_shape:
            plt.text(0, image.shape[0], '{}'.format(image.shape), color='k', backgroundcolor='y', fontsize=8)
        plt.xticks([])
        plt.yticks([])
    plt.show()
Below are 20 random sample images from the training set.
sample_data = get_samples(train_df, 20)
show_images(sample_data, sign_names=SIGN_NAMES, show_shape=True)
print(SIGN_NAMES[2])
show_images(get_samples(train_df, 100, class_id=2), cols=20, show_shape=True)
Looking at the sample images, the following image characteristics are confirmed:
The first point will be handled in the image pre-processing, and the remaining points will be handled in the image augmentation.
Training and validation data sets are created from the training data.
X = train_df['Filename'].values
y = train_df['ClassId'].values
print('X data', len(X))
X_train, X_valid, y_train, y_valid = train_test_split(X, y, stratify=y, test_size=8000, random_state=0)
print('X_train:', len(X_train))
print('X_valid:', len(X_valid))
The model is based on LeNet by Yann LeCun. It is a convolutional neural network designed to recognize visual patterns directly from pixel images with minimal preprocessing. It can handle hand-written characters very well.

Source: http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf
Our model is adapted from the LeNet as follows.
| Layer | Shape |
|---|---|
| Input | 32x32x3 |
| Convolution (valid, 5x5x6) | 28x28x6 |
| Max Pooling (valid, 2x2) | 14x14x6 |
| Activation (ReLU) | 14x14x6 |
| Convolution (valid, 5x5x16) | 10x10x16 |
| Max Pooling (valid, 2x2) | 5x5x16 |
| Activation (ReLU) | 5x5x16 |
| Flatten | 400 |
| Dense | 120 |
| Activation (ReLU) | 120 |
| Dense | 43 |
| Activation (Softmax) | 43 |
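The shapes in the table follow from 'valid' convolutions (output side = input side - kernel + 1) and 2x2 pooling (each side halved). A quick sanity check of that arithmetic:

```python
# Sanity-check the layer shapes in the table above.
def conv_valid(size, kernel):
    # 'valid' convolution shrinks each side by (kernel - 1)
    return size - kernel + 1

size = 32                    # input: 32x32x3
size = conv_valid(size, 5)   # conv 5x5x6  -> 28x28x6
size //= 2                   # max pool    -> 14x14x6
size = conv_valid(size, 5)   # conv 5x5x16 -> 10x10x16
size //= 2                   # max pool    -> 5x5x16
flattened = size * size * 16 # flatten     -> 400
assert flattened == 400
print(flattened)
```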
The NeuralNetwork class provides common neural network operations using TensorFlow. See network.py for details.
The first network (based on LeNet) is defined as follows:
INPUT_SHAPE = (32, 32, 3)
def make_network1(input_shape=INPUT_SHAPE):
    return (NeuralNetwork()
            .input(input_shape)
            .conv([5, 5, 6])
            .max_pool()
            .relu()
            .conv([5, 5, 16])
            .max_pool()
            .relu()
            .flatten()
            .dense(120)
            .relu()
            .dense(N_CLASSES))
We are using scikit-learn's pipeline framework to handle various pipeline scenarios. See pipeline.py for details.
Once made, a pipeline can be trained and evaluated using the function below:
def train_evaluate(pipeline, epochs=5, samples_per_epoch=50000, train=(X_train, y_train), test=(X_valid, y_valid)):
    """
    Repeat the training for the epochs and evaluate the performance
    """
    X, y = train
    learning_curve = []
    for i in range(epochs):
        indices = np.random.choice(len(X), size=samples_per_epoch)
        pipeline.fit(X[indices], y[indices])
        scores = [pipeline.score(*train), pipeline.score(*test)]
        learning_curve.append([i, *scores])
        print("Epoch: {:>3} Train Score: {:.3f} Evaluation Score: {:.3f}".format(i, *scores))
    return np.array(learning_curve).T  # (epochs, train scores, eval scores)
Let's train the first network. Its performance is our initial benchmark.
def resize_image(image, shape=INPUT_SHAPE[:2]):
    return cv2.resize(image, shape)

loader = lambda image_file: resize_image(load_image(image_file))

with Session() as session:
    functions = [loader]
    pipeline = build_pipeline(functions, session, make_network1(), make_adam(1.0e-3))
    train_evaluate(pipeline)
Observation:
This proves the network is working properly. The performance is pretty good for a barebones network.
I can see a bit of overfitting. This is likely because the network is exposed to the same images over and over, since I'm using 5 epochs (50K samples per epoch). At this point, it is good to see that the network is able to overfit and is not showing high bias. The network can handle these images and is able to learn from the data.
As the training set has a very skewed distribution, if I simply increase the epochs or samples per epoch, the network will overfit to the training set. Hence, we should generate more training data using image augmentation.
def random_brightness(image, ratio):
    """
    Randomly adjust brightness of the image.
    """
    # HSV (Hue, Saturation, Value) is also called HSB ('B' for Brightness).
    hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
    brightness = np.float64(hsv[:, :, 2])
    brightness = brightness * (1.0 + np.random.uniform(-ratio, ratio))
    brightness[brightness>255] = 255
    brightness[brightness<0] = 0
    hsv[:, :, 2] = brightness
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)
def random_rotation(image, angle):
    """
    Randomly rotate the image
    """
    if angle == 0:
        return image
    angle = np.random.uniform(-angle, angle)
    rows, cols = image.shape[:2]
    size = cols, rows
    center = cols/2, rows/2
    scale = 1.0
    rotation = cv2.getRotationMatrix2D(center, angle, scale)
    return cv2.warpAffine(image, rotation, size)
def random_translation(image, translation):
    """
    Randomly move the image
    """
    if translation == 0:
        return image
    rows, cols = image.shape[:2]
    size = cols, rows
    x = np.random.uniform(-translation, translation)
    y = np.random.uniform(-translation, translation)
    trans = np.float32([[1, 0, x], [0, 1, y]])
    return cv2.warpAffine(image, trans, size)
def random_shear(image, shear):
    """
    Randomly distort the image
    """
    if shear == 0:
        return image
    rows, cols = image.shape[:2]
    size = cols, rows
    left, right, top, bottom = shear, cols - shear, shear, rows - shear
    dx = np.random.uniform(-shear, shear)
    dy = np.random.uniform(-shear, shear)
    p1 = np.float32([[left,    top], [right,    top],    [left, bottom]])
    p2 = np.float32([[left+dx, top], [right+dx, top+dy], [left, bottom+dy]])
    move = cv2.getAffineTransform(p1, p2)
    return cv2.warpAffine(image, move, size)
def augment_image(image, brightness, angle, translation, shear):
    image = random_brightness(image, brightness)
    image = random_rotation(image, angle)
    image = random_translation(image, translation)
    image = random_shear(image, shear)
    return image

augmenter = lambda x: augment_image(x, brightness=0.7, angle=10, translation=5, shear=2)

show_images(sample_data[10:], cols=10)  # original
for _ in range(5):
    show_images(sample_data[10:], cols=10, func=augmenter)

with Session() as session:
    functions = [loader, augmenter]
    pipeline = build_pipeline(functions, session, make_network1(), make_adam(1.0e-3))
    train_evaluate(pipeline)
Observation:
The hyperparameters for brightness, rotation, translation, and shear were tuned manually by looking at the randomly altered images. If the alteration is too great, the result is not realistic. For the same reason that horizontal flips are not included, overly large changes such as 90-degree rotations should not be used.
The performance with augmentation is much worse than without it. There are two possible reasons:
Let's first see how other preprocessing can improve the performance. I hope normalization and other techniques will make learning easier for the network. Once that's done, I will use many more epochs to properly measure the performance.
The code below tests various normalization techniques to see which one performs best.
normalizers = [('x - 127.5', lambda x: x - 127.5),
               ('x/127.5 - 1.0', lambda x: x/127.5 - 1.0),
               ('x/255.0 - 0.5', lambda x: x/255.0 - 0.5),
               ('x - x.mean()', lambda x: x - x.mean()),
               ('(x - x.mean())/x.std()', lambda x: (x - x.mean())/x.std())]

for name, normalizer in normalizers:
    print('Normalizer: {}'.format(name))
    with Session() as session:
        functions = [loader, augmenter, normalizer]
        pipeline = build_pipeline(functions, session, make_network1(), make_adam(1.0e-3))
        train_evaluate(pipeline)
    print()
Observation:
The performance with any of the normalizations is better than without them, which clearly shows the importance of normalization. In this experiment, (x - x.mean())/x.std() produced the best performance. The results vary somewhat between runs, so it is hard to rank the alternatives definitively, but (x - x.mean())/x.std() consistently comes out ahead.
There are more techniques, such as subtracting an average image of all training data, which I may try later. For now, I will use the best-performing normalization for the rest of the experiments.
normalizer = lambda x: (x - x.mean())/x.std()
Now we'll try different color spaces to see if there is any performance gain.
Note: gray scale has only one channel, so it needs to be handled separately.
Color Code Reference:
# for gray scale, we need to add the 3rd dimension back (1 channel) as the network expects it
converters = [('Gray', lambda x: cv2.cvtColor(x, cv2.COLOR_RGB2GRAY)[:, :, np.newaxis]),
              ('HSV', lambda x: cv2.cvtColor(x, cv2.COLOR_RGB2HSV)),
              ('HLS', lambda x: cv2.cvtColor(x, cv2.COLOR_RGB2HLS)),
              ('Lab', lambda x: cv2.cvtColor(x, cv2.COLOR_RGB2Lab)),
              ('Luv', lambda x: cv2.cvtColor(x, cv2.COLOR_RGB2Luv)),
              ('XYZ', lambda x: cv2.cvtColor(x, cv2.COLOR_RGB2XYZ)),
              ('YCrCb', lambda x: cv2.cvtColor(x, cv2.COLOR_RGB2YCrCb)),
              ('YUV', lambda x: cv2.cvtColor(x, cv2.COLOR_RGB2YUV))]
GRAY_INPUT_SHAPE = (*INPUT_SHAPE[:2], 1)
for name, converter in converters:
    print('Color Space: {}'.format(name))
    with Session() as session:
        functions = [loader, augmenter, converter, normalizer]
        if name == 'Gray':
            network = make_network1(input_shape=GRAY_INPUT_SHAPE)  # only one channel in gray scale
        else:
            network = make_network1()
        pipeline = build_pipeline(functions, session, network, make_adam(1.0e-3))
        train_evaluate(pipeline)
    print()
Observation:
RGB (no conversion) is the best, which surprised me. I was expecting gray scale to be more efficient, as traffic signs are mostly about shapes, not colors. Gray scale would have reduced the dimensionality from 3 color channels to 1, which should make learning faster and easier. Apparently, that is not the case.
I also thought that the colors in traffic signs are more saturated than those of the background (e.g., trees), so color spaces like HSV and HLS might contribute to superior performance. This was not the case either. Now that I have seen the result, I realize I should not assume anything about the background colors.
On a separate note, I noticed that the result is slightly different every time I run this cell. For example, gray scale or XYZ sometimes performs better than RGB (no conversion). This could be due to the randomness of the image augmentation, but most of the time RGB (no conversion) is the best. To analyze this further, I'd need to check the histograms of the different channels and the performance for each color space.
But that is the kind of thing the network should figure out automatically (i.e., automatic feature engineering), so I will not mess with the color space, at least for now. It keeps the pipeline simpler, too.
I'm done with the preprocessing part.
preprocessors = [loader, augmenter, normalizer]
I want to try the following to see if I can improve the performance without causing overfitting:
elu instead of relu
The code below plots the learning curve.
def show_learning_curve(learning_curve):
    epochs, train, valid = learning_curve
    plt.figure(figsize=(10, 10))
    plt.plot(epochs, train, label='train')
    plt.plot(epochs, valid, label='validation')
    plt.title('Learning Curve')
    plt.ylabel('accuracy')
    plt.xlabel('epochs')
    plt.xticks(epochs)
    plt.legend(loc='center right')
These functions are for plotting and printing the confusion matrix.
def plot_confusion_matrix(cm):
    cm = [row/sum(row) for row in cm]
    fig = plt.figure(figsize=(10, 10))
    ax = fig.add_subplot(111)
    cax = ax.matshow(cm, cmap=plt.cm.Oranges)
    fig.colorbar(cax)
    plt.title('Confusion Matrix')
    plt.xlabel('Predicted Class IDs')
    plt.ylabel('True Class IDs')
    plt.show()

def print_confusion_matrix(cm, sign_names=SIGN_NAMES):
    results = [(i, sign_names[i], row[i]/sum(row)*100) for i, row in enumerate(cm)]
    accuracies = []
    for result in sorted(results, key=lambda x: -x[2]):
        print('{:>2} {:<50} {:6.2f}% {:>4}'.format(*result, sum(y_train==result[0])))
        accuracies.append(result[2])
    print('-'*50)
    print('Accuracy: Mean: {:.3f} Std: {:.3f}'.format(np.mean(accuracies), np.std(accuracies)))
Doubling all filters in the convolutional layers and neurons in the dense layers.
def make_network2(input_shape=INPUT_SHAPE):
    return (NeuralNetwork()
            .input(input_shape)
            .conv([5, 5, 12])  # <== doubled
            .max_pool()
            .relu()
            .conv([5, 5, 32])  # <== doubled
            .max_pool()
            .relu()
            .flatten()
            .dense(240)  # <== doubled
            .relu()
            .dense(N_CLASSES))
with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network2(), make_adam(1.0e-3))
    learning_curve = train_evaluate(pipeline)
    session.save('checkpoint/network2.ckpt')

show_learning_curve(learning_curve)

with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network2())
    session.load('checkpoint/network2.ckpt')
    pred = pipeline.predict(X_valid)

# examine the confusion matrix
cm = confusion_matrix(y_valid, pred)
plot_confusion_matrix(cm)
print_confusion_matrix(cm)
Observation:
The performance improved. The training accuracy is slightly higher than the validation accuracy. It might be a sign of overfitting, but I'll need to check by increasing the complexity of the network.
On a separate note, I could have tried changing one layer at a time, but changing all three worked, so I'm okay with this.
The confusion matrix's mean accuracy is the sum of the per-class mean accuracies divided by the number of classes. It is lower than the overall accuracy, indicating that the larger classes are performing better (or the smaller classes worse).
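To make the distinction concrete, here is a small sketch with a made-up 2-class confusion matrix (not the project's actual numbers) comparing overall accuracy against the unweighted mean of per-class accuracies:

```python
import numpy as np

# Toy confusion matrix (rows = true class, cols = predicted class):
# a large class performing well and a small class performing poorly.
cm = np.array([[950,  50],   # large class: 95% accurate
               [ 40,  60]])  # small class: 60% accurate

overall = np.trace(cm) / cm.sum()           # weighted by class size
per_class = cm.diagonal() / cm.sum(axis=1)  # per-class accuracy (recall)
mean_per_class = per_class.mean()           # unweighted mean

print('overall: {:.3f}'.format(overall))                 # 0.918
print('mean per class: {:.3f}'.format(mean_per_class))   # 0.775
```

The overall accuracy is pulled up by the large, well-classified class, while the unweighted mean exposes the weak small class.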
Further doubling all filters in the convolutional layers and neurons in the dense layers.
def make_network3(input_shape=INPUT_SHAPE):
    return (NeuralNetwork()
            .input(input_shape)
            .conv([5, 5, 24])  # <== doubled
            .max_pool()
            .relu()
            .conv([5, 5, 64])  # <== doubled
            .max_pool()
            .relu()
            .flatten()
            .dense(480)  # <== doubled
            .relu()
            .dense(N_CLASSES))
with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network3(), make_adam(1.0e-3))
    learning_curve = train_evaluate(pipeline)
    session.save('checkpoint/network3.ckpt')

show_learning_curve(learning_curve)

with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network3())
    session.load('checkpoint/network3.ckpt')
    pred = pipeline.predict(X_valid)

# examine the confusion matrix
cm = confusion_matrix(y_valid, pred)
plot_confusion_matrix(cm)
print_confusion_matrix(cm)
Observation:
The performance is better. It may be showing slight overfitting, but I don't think we need to apply any regularization at this stage. I should rather try more epochs to see how far it can improve.
For almost all classes, the network achieves better than 90% accuracy, suggesting that increasing the network complexity is making it more robust.
with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network3(), make_adam(1.0e-3))
    learning_curve = train_evaluate(pipeline, epochs=20)
    session.save('checkpoint/network3_epochs-20.ckpt')

show_learning_curve(learning_curve)

with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network3())
    session.load('checkpoint/network3_epochs-20.ckpt')
    pred = pipeline.predict(X_valid)

# examine the confusion matrix
cm = confusion_matrix(y_valid, pred)
plot_confusion_matrix(cm)
print_confusion_matrix(cm)
Observation:
The performance did improve but not in the last several epochs.
100% accuracy is achieved for more classes. Also, the bottom performer is improving as well.
Let's try a lower learning rate with epochs=20.
with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network3(), make_adam(0.5e-3))  # <== lower learning rate
    learning_curve = train_evaluate(pipeline, epochs=20)
    session.save('checkpoint/network3_epochs-20_lr-0.5e-3.ckpt')

show_learning_curve(learning_curve)

with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network3())
    session.load('checkpoint/network3_epochs-20_lr-0.5e-3.ckpt')
    pred = pipeline.predict(X_valid)

# examine the confusion matrix
cm = confusion_matrix(y_valid, pred)
plot_confusion_matrix(cm)
print_confusion_matrix(cm)
Observation:
The performance is almost the same or slightly better. The learning curve looks much smoother. The average accuracy per class is slightly better, too. Overall, I believe the smaller learning rate was a worthy change.
with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network3(), make_adam(1.0e-4))  # <== lower learning rate
    learning_curve = train_evaluate(pipeline, epochs=20)
    session.save('checkpoint/network3_epochs-20_lr-1.0e-4.ckpt')

show_learning_curve(learning_curve)

with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network3())
    session.load('checkpoint/network3_epochs-20_lr-1.0e-4.ckpt')
    pred = pipeline.predict(X_valid)

# examine the confusion matrix
cm = confusion_matrix(y_valid, pred)
plot_confusion_matrix(cm)
print_confusion_matrix(cm)
Observation:
Let's stick with the previous learning rate for the time being.
Let's try leaky ReLU (to avoid the dying ReLU problem, if any).
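For reference, leaky ReLU passes positive inputs through unchanged and scales negative inputs by a small slope instead of zeroing them, so gradients keep flowing through otherwise "dead" units. A minimal NumPy sketch (the `leak_ratio` name mirrors the NeuralNetwork API used below):

```python
import numpy as np

def leaky_relu(x, leak_ratio=0.01):
    # Positives pass through; negatives are scaled by a small slope
    # instead of being zeroed, so their gradient is leak_ratio, not 0.
    return np.where(x > 0, x, leak_ratio * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(x))  # values: -0.02, -0.005, 0.0, 1.5
```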
def make_network4(input_shape=INPUT_SHAPE):
    return (NeuralNetwork()
            .input(input_shape)
            .conv([5, 5, 24])
            .max_pool()
            .relu(leak_ratio=0.01)  # <== leaky ReLU
            .conv([5, 5, 64])
            .max_pool()
            .relu(leak_ratio=0.01)  # <== leaky ReLU
            .flatten()
            .dense(480)
            .relu(leak_ratio=0.01)  # <== leaky ReLU
            .dense(N_CLASSES))
with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network4(), make_adam(0.5e-3))
    learning_curve = train_evaluate(pipeline, epochs=20)
    session.save('checkpoint/network4.ckpt')

show_learning_curve(learning_curve)
Observation:
No improvement.
ELU (Exponential Linear Unit) activation, which is supposed to learn faster than ReLU.
Reference: http://www.picalike.com/blog/2015/11/28/relu-was-yesterday-tomorrow-comes-elu/
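For reference, ELU is the identity for positive inputs and alpha*(exp(x) - 1) for negative inputs, saturating smoothly at -alpha. A minimal NumPy sketch:

```python
import numpy as np

def elu(x, alpha=1.0):
    # Identity for positives; smooth, bounded negative branch
    # alpha*(exp(x) - 1), which saturates at -alpha as x -> -inf.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-1.0, 0.0, 2.0])
print(elu(x))  # values: exp(-1)-1 (about -0.632), 0.0, 2.0
```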
def make_network5(input_shape=INPUT_SHAPE):
    return (NeuralNetwork()
            .input(input_shape)
            .conv([5, 5, 24])
            .max_pool()
            .elu()  # <== ELU
            .conv([5, 5, 64])
            .max_pool()
            .elu()  # <== ELU
            .flatten()
            .dense(480)
            .elu()  # <== ELU
            .dense(N_CLASSES))
with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network5(), make_adam(0.5e-3))
    learning_curve = train_evaluate(pipeline, epochs=20)
    session.save('checkpoint/network5.ckpt')

show_learning_curve(learning_curve)
Observation:
The performance is worse. Also, it did not learn faster.
Let's try smaller initial weight values.
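The intuition: for a dense layer `x @ W`, the pre-activation standard deviation scales roughly as sigma * sqrt(fan_in), so a smaller initial sigma shrinks the initial activations. A rough NumPy sketch (plain Gaussian initialization; the NeuralNetwork class's weight_sigma presumably uses a truncated normal, which this sketch does not):

```python
import numpy as np

rng = np.random.RandomState(0)

def init_weights(shape, sigma):
    # Gaussian init; roughly what a weight_sigma parameter controls.
    return rng.normal(0.0, sigma, size=shape)

x = rng.normal(size=(100, 400))  # a batch of 400-dim inputs (unit variance)
for sigma in (0.1, 0.01):
    W = init_weights((400, 120), sigma)
    pre_act = x @ W
    # std is roughly sigma * sqrt(400) = 20 * sigma
    print('sigma={:<5} pre-activation std: {:.3f}'.format(sigma, pre_act.std()))
```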
def make_network6(input_shape=INPUT_SHAPE):
    return (NeuralNetwork(weight_sigma=0.01)  # <== smaller weight sigma
            .input(input_shape)
            .conv([5, 5, 24])
            .max_pool()
            .relu()
            .conv([5, 5, 64])
            .max_pool()
            .relu()
            .flatten()
            .dense(480)
            .relu()
            .dense(N_CLASSES))
with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network6(), make_adam(0.5e-3))
    learning_curve = train_evaluate(pipeline, epochs=20)
    session.save('checkpoint/network6.ckpt')

show_learning_curve(learning_curve)
Observation:
Not an improvement - a bit worse.
Adding one more dense layer.
def make_network7(input_shape=INPUT_SHAPE):
    return (NeuralNetwork()
            .input(input_shape)
            .conv([5, 5, 24])
            .max_pool()
            .relu()
            .conv([5, 5, 64])
            .max_pool()
            .relu()
            .flatten()
            .dense(480)
            .relu()
            .dense(240)  # <== one more dense layer
            .relu()
            .dense(N_CLASSES))
with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network7(), make_adam(0.5e-3))
    learning_curve = train_evaluate(pipeline, epochs=20)
    session.save('checkpoint/network7.ckpt')

show_learning_curve(learning_curve)
Observation:
No improvement - a bit worse.
The same as Network 3 but using MaxPooling after ReLU.
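Both ReLU and max are monotonic non-decreasing, so swapping max pooling and ReLU cannot change the output, only the amount of computation (pooling first means ReLU touches a quarter of the values). A quick NumPy check of that claim:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def max_pool_2x2(x):
    # Non-overlapping 2x2 max pooling on a (H, W) array with even sides.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.random.RandomState(0).randn(8, 8)
# max and ReLU are both monotonic, so the order does not change the result.
assert np.array_equal(relu(max_pool_2x2(x)), max_pool_2x2(relu(x)))
print('identical outputs')
```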
def make_network8(input_shape=INPUT_SHAPE):
    return (NeuralNetwork()
            .input(input_shape)
            .conv([5, 5, 24])
            .relu()
            .max_pool()  # <== after ReLU
            .conv([5, 5, 64])
            .relu()
            .max_pool()  # <== after ReLU
            .flatten()
            .dense(480)
            .relu()
            .dense(N_CLASSES))
with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network8(), make_adam(0.5e-3))
    learning_curve = train_evaluate(pipeline, epochs=20)
    session.save('checkpoint/network8.ckpt')

show_learning_curve(learning_curve)
Observation:
No improvement - about the same.
Let's try 3 convolutional layers.
def make_network9(input_shape=INPUT_SHAPE):
    return (NeuralNetwork()
            .input(input_shape)
            .conv([5, 5, 24])
            .max_pool()
            .relu()
            .conv([5, 5, 64])
            .max_pool()
            .relu()
            .conv([3, 3, 64])  # <== smaller kernel (the feature map is small at this point)
            .max_pool()
            .relu()
            .flatten()
            .dense(480)
            .relu()
            .dense(N_CLASSES))
with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network9(), make_adam(0.5e-3))
    learning_curve = train_evaluate(pipeline, epochs=20)
    session.save('checkpoint/network9.ckpt')

show_learning_curve(learning_curve)
Observation:
No improvement - a bit worse.
for momentum in [0.7, 0.8, 0.9]:
    with Session() as session:
        print('Momentum: {}'.format(momentum))
        optimizer = tf.train.MomentumOptimizer(learning_rate=0.5e-3, momentum=momentum)
        pipeline = build_pipeline(preprocessors, session, make_network3(), optimizer)
        train_evaluate(pipeline, epochs=20)
        session.save('checkpoint/network3_momentum_{}.ckpt'.format(momentum))
    print()
Observation:
Got worse.
Will it help to have a balanced class distribution of training data?
def balance_distribution(X, y, size):
    X_balanced = []
    y_balanced = []
    for c in range(N_CLASSES):
        data = X[y==c]
        indices = np.random.choice(len(data), size)  # sample with replacement
        X_balanced.extend(data[indices])
        y_balanced.extend(y[y==c][indices])
    return np.array(X_balanced), np.array(y_balanced)

X_balanced, y_balanced = balance_distribution(X_train, y_train, 3000)
show_class_distribution(y_balanced, 'Balanced Train Set')
Let's try the balanced data set with our best pipeline (Network 3 with learning rate = 0.5e-3)
with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network3(), make_adam(0.5e-3))
    learning_curve = train_evaluate(pipeline, epochs=20, train=(X_balanced, y_balanced))  # <== using the balanced train set
    session.save('checkpoint/network3_with_balanced_data.ckpt')

show_learning_curve(learning_curve)
Observation:
The validation accuracy is much worse than before. This is likely because the distribution is different, indicating the network is learning the distribution which is different from the validation set. Assuming the test set has the same kind of validation (we should not check the test set at this stage), using the balance set may not be a good idea.
Instead, we should do more epochs so that minor classes are more visible to the network.
Let's just try with much more epochs.
with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network3(), make_adam(0.5e-3))
    learning_curve = train_evaluate(pipeline, epochs=100)

show_learning_curve(learning_curve)
Observation:
It performs much better now, but the last several epochs are not really helping the network learn. I should try a smaller learning rate to see how it goes.
with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network3(), make_adam(1.0e-4))
    learning_curve = train_evaluate(pipeline, epochs=500)
    session.save('checkpoint/network3_epochs-500_lr-1.0e-4.ckpt')

show_learning_curve(learning_curve)
Observation:
The performance has improved. It appears that 100 epochs are enough to achieve this performance.
Is this as good as it can get?
Can we make the network more robust? How about a dropout?
def make_network10(input_shape=INPUT_SHAPE):
    return (NeuralNetwork()
            .input(input_shape)
            .conv([5, 5, 24])
            .max_pool()
            .relu()
            .conv([5, 5, 64])
            .max_pool()
            .relu()
            .dropout(keep_prob=0.5)
            .flatten()
            .dense(480)
            .relu()
            .dense(N_CLASSES))
with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network10(), make_adam(1.0e-4))
    learning_curve = train_evaluate(pipeline, epochs=500)
    session.save('checkpoint/network10.ckpt')

show_learning_curve(learning_curve)

with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network10())
    session.load('checkpoint/network10.ckpt')
    pred = pipeline.predict(X_valid)

# examine the confusion matrix
cm = confusion_matrix(y_valid, pred)
plot_confusion_matrix(cm)
print_confusion_matrix(cm)
Observation:
The validation performance is more stable now. 500 epochs is probably overkill.
These are extra experiments to see if we can improve the pipeline further. I experimented with this preprocessing before the network was fully trained, and the effect seemed random (sometimes good, other times bad), so I discarded the idea.
But I'm redoing it to see whether it has a positive effect after the network is fully trained.
The short conclusion is that these techniques don't work well, very likely because the network was trained without them. It seems abundantly clear in hindsight.
Below is a series of unfortunate experiments, showing what a bad idea it is to change preprocessing after training.
Taking a weighted average of the original image and a blurred copy in order to smooth out noise.
def enhance_image(image, ksize, weight):
    blurred = cv2.GaussianBlur(image, (ksize, ksize), 0)
    return cv2.addWeighted(image, weight, blurred, -weight, image.mean())

for ksize in [5, 7, 9, 11]:
    for weight in [4, 6, 8, 10]:
        print('Enhancer: k={} w={}'.format(ksize, weight))
        with Session() as session:
            enhancer = lambda x: enhance_image(x, ksize, weight)
            functions = [loader, augmenter, enhancer, normalizer]
            pipeline = build_pipeline(functions, session, make_network10())
            session.load('checkpoint/network10.ckpt')
            score = pipeline.score(X_valid, y_valid)
            print('Validation Score: {}'.format(score))
        print()
enhancer = lambda x: enhance_image(x, 9, 8)
show_images(sample_data[10:], cols=10)
show_images(sample_data[10:], cols=10, func=enhancer)
Observations:
I tried these earlier and the results were pretty random - sometimes an improvement, sometimes not.
With the pre-trained network, as shown above, the result is worse, since the network is already tuned for images without the enhancement.
Perhaps I should use it during training to see whether it speeds up learning.
However, as the network performs really well without this, my conclusion is not to use this filter at all.
def equalizer(image):
    image = image.copy()
    for i in range(3):
        image[:, :, i] = cv2.equalizeHist(image[:, :, i])
    return image
show_images(sample_data[10:], cols=10)
show_images(sample_data[10:], cols=10, func=equalizer)
with Session() as session:
    functions = [loader, augmenter, equalizer, normalizer]
    pipeline = build_pipeline(functions, session, make_network10())
    session.load('checkpoint/network10.ckpt')
    score = pipeline.score(X_valid, y_valid)
    print('Validation Score: {:.3f}'.format(score))
Observation:
If this had been tried before the network was fully trained, it might have made learning easier. Earlier, I played with this preprocessing before training, but it did not produce better results than the enhancer. Maybe I will retry that one day.
How about combining the histogram equalizer and the enhancer?
with Session() as session:
    functions = [loader, augmenter, equalizer, enhancer, normalizer]
    pipeline = build_pipeline(functions, session, make_network10())
    session.load('checkpoint/network10.ckpt')
    score = pipeline.score(X_valid, y_valid)
    print(score)
Observation:
The histogram equalizer + enhancer combination does not improve things.
def min_max_norm(image):
    return cv2.normalize(image, None, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX)
show_images(sample_data[10:], cols=10)
show_images(sample_data[10:], cols=10, func=min_max_norm)
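The cv2.normalize call above with NORM_MINMAX is just a per-image linear rescale to the 0..255 range; an equivalent NumPy sketch (OpenCV's rounding behavior differs slightly):

```python
import numpy as np

def min_max_norm_np(image):
    # stretch the pixel values linearly so the darkest pixel becomes 0
    # and the brightest becomes 255 (a flat image maps to all zeros)
    lo, hi = int(image.min()), int(image.max())
    if hi == lo:
        return np.zeros_like(image)
    scaled = (image.astype(np.float64) - lo) * 255.0 / (hi - lo)
    return scaled.astype(np.uint8)

patch = np.array([[10, 20], [30, 40]], dtype=np.uint8)
print(min_max_norm_np(patch))  # corners map to 0 and 255
```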
with Session() as session:
    functions = [loader, augmenter, min_max_norm, normalizer]
    pipeline = build_pipeline(functions, session, make_network10())
    session.load('checkpoint/network10.ckpt')
    score = pipeline.score(X_valid, y_valid)
    print(score)
Observation:
I tried the Min-Max normalizer before fully training the network and did not get good results.
This experiment just checks whether it can do any good once the network is fully trained. It proves otherwise.
Let's try the combination with the enhancer.
with Session() as session:
    functions = [loader, augmenter, min_max_norm, enhancer, normalizer]
    pipeline = build_pipeline(functions, session, make_network10())
    session.load('checkpoint/network10.ckpt')
    score = pipeline.score(X_valid, y_valid)
    print(score)
Observation:
The same argument applies here; I don't need this either.
Test images are not organized in category folders; they are all kept in one place with a single label file, so we can simply load them as follows:
data
+ Final_Test
+ Images
+ 00000.ppm
+ 00001.ppm
+ ...
+ GT-final_test.csv # Extended annotations including class ids
+ GT-final_test.test.csv
I also downloaded GT-final_test.csv, which contains extended annotations including class IDs for the test images.
TEST_IMAGE_DIR = 'data/Final_Test/Images'
# Note: GT-final_test.csv comes with class IDs (GT-final_test.test.csv does not)
test_df = pd.read_csv(os.path.join(TEST_IMAGE_DIR, 'GT-final_test.csv'), sep=';')
test_df['Filename'] = test_df['Filename'].apply(lambda x: os.path.join(TEST_IMAGE_DIR, x))
test_df.head()
print("Number of test images: {:>5}".format(test_df.shape[0]))
X_test = test_df['Filename'].values
y_test = test_df['ClassId'].values
with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network10())
    session.load('checkpoint/network10.ckpt')
    score = pipeline.score(X_test, y_test)
    print('Test Score: {}'.format(score))
Observation:
It achieves about 96% accuracy. For a simple network like this one, that is really good work.
I would probably need a more complex network to do better, but that requires better hardware to train. Even an AWS g2 instance is not fast enough to run 500 epochs.
X_new = np.array(glob.glob('images/sign*.jpg') +
                 glob.glob('images/sign*.png'))
new_images = [plt.imread(path) for path in X_new]
print('-' * 80)
print('New Images for Random Testing')
print('-' * 80)
plt.figure(figsize=(15,5))
for i, image in enumerate(new_images):
    plt.subplot(2, len(X_new) // 2, i + 1)
    plt.imshow(image)
    plt.xticks([])
    plt.yticks([])
plt.show()
print('getting top 5 results')
with Session() as session:
    pipeline = build_pipeline(preprocessors, session, make_network10())
    session.load('checkpoint/network10.ckpt')
    prob = pipeline.predict_proba(X_new)
    estimator = pipeline.steps[-1][1]
    top_5_prob, top_5_pred = estimator.top_k_
print('done')
print('-' * 80)
print('Top 5 Predictions')
print('-' * 80)
for i, (preds, probs, image) in enumerate(zip(top_5_pred, top_5_prob, new_images)):
    plt.imshow(image)
    plt.xticks([])
    plt.yticks([])
    plt.show()
    for pred, prob in zip(preds.astype(int), probs):
        sign_name = SIGN_NAMES[pred]
        print('{:>5}: {:<50} ({:>14.10f}%)'.format(pred, sign_name, prob * 100.0))
    print('-' * 80)
Observation:
7 out of 10 are correct.
I can understand why it did not identify the pedestrian sign correctly, as it is not a German traffic sign, though it is quite similar to one. A human would have recognized it. This means the machine learned only the shapes, not the concept, which is understandable given how convolutional neural networks work.
This also means that for every country / region, we would need to train the classifier on their traffic signs.
As for the speed limits (80km and 100km), I suspect the misclassification may be due to image distortion from the resizing operation. We may need a better way to resize images, but this is yet to be proven at this stage.
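If the distortion hypothesis holds, one fix would be to resize while preserving the aspect ratio and pad the remainder instead of stretching. A hypothetical helper, sketched with nearest-neighbour sampling in plain NumPy (a real pipeline would use cv2.resize with proper interpolation):

```python
import numpy as np

def resize_keep_aspect(image, size=32):
    # scale the longer side to `size` (nearest-neighbour sampling),
    # then center the result on a square zero-padded canvas
    h, w = image.shape[:2]
    scale = size / max(h, w)
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = image[rows][:, cols]
    out = np.zeros((size, size) + image.shape[2:], dtype=image.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    out[top:top + nh, left:left + nw] = resized
    return out

wide = np.full((20, 40, 3), 255, dtype=np.uint8)
print(resize_keep_aspect(wide).shape)  # always (32, 32, 3), no stretching
```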
The use of pipelines was very effective during the experimentation. In the end, the traffic sign classifier works pretty well overall on the test set.
However, the network did not perform as well on random sample images from the internet.
Moreover, if a self-driving car needs to find traffic signs in public, it first needs to know where the traffic signs are. It's a chicken-and-egg problem.
Therefore, we would need an object detection mechanism that scans across the image with sliding windows to find candidate signs. That kind of detection mechanism is not covered in this project.
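As a rough illustration of the idea (not part of this project), a detector could slide a fixed-size window over the frame and score each crop with the trained classifier; a real system would also scan multiple scales with an image pyramid. The window size and stride below are arbitrary:

```python
import numpy as np

def sliding_windows(image, win=32, stride=16):
    # yield (x, y, crop) for every window position; each crop could
    # then be scored by the trained sign classifier
    h, w = image.shape[:2]
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            yield x, y, image[y:y + win, x:x + win]

frame = np.zeros((64, 96, 3), dtype=np.uint8)
crops = list(sliding_windows(frame))
print(len(crops), crops[0][2].shape)  # number of windows and crop shape
```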